
    Structured Bayesian methods for splicing analysis in RNA-seq data

    In most eukaryotes, alternative splicing is an important regulatory mechanism of gene expression that allows a single gene to code for multiple protein isoforms, thereby greatly increasing the diversity of the proteome. RNA-seq is widely used for genome-wide splicing isoform quantification, and several effective and powerful methods have been developed for splicing analysis with RNA-seq data. However, quantification remains problematic for genes with low coverage or a large number of isoforms. These difficulties may in principle be ameliorated by exploiting correlations encoded in structured data sources. This thesis contributes to the development of Bayesian methods for splicing analysis by leveraging additional information in multiple datasets through structured prior distributions. First, we developed DICEseq, the first isoform quantification method tailored to time-series RNA-seq experiments. DICEseq explicitly models the correlations between experiments at different time points to aid the quantification of isoforms across experiments. Numerical experiments on both simulated and real datasets show that DICEseq yields more accurate results than state-of-the-art methods, an advantage that can become considerable at low coverage levels. Furthermore, DICEseq makes it possible to quantify the trade-off between temporal sampling of RNA and depth of sequencing, frequently an important choice when planning experiments. Second, we developed BRIE (Bayesian Regression for Isoform Estimation), a Bayesian hierarchical model that resolves the difficulties of splicing analysis in single-cell RNA-seq (scRNA-seq) data by learning an informative prior distribution from sequence features. This method combines quantification and imputation for splicing analysis in a single Bayesian framework, which is particularly useful for scRNA-seq data because of its extremely low coverage and high technical noise. We validated BRIE on several scRNA-seq datasets, showing that BRIE yields reproducible estimates of exon inclusion ratios in single cells. Third, we provided an effective tool that uses Bayes factors to sensitively detect differential splicing between single cells. When applying BRIE to a few real datasets, we found interesting heterogeneity patterns in splicing events across cell populations, for example in alternative exons of DNMT3B. In summary, this thesis proposes structured Bayesian methods that integrate multiple datasets to improve splicing analysis and to study its biological functions.

    Statistical modeling of isoform splicing dynamics from RNA-seq time series data

    Isoform quantification is an important goal of RNA-seq experiments, yet it remains problematic for genes with low expression or several isoforms. These difficulties may in principle be ameliorated by exploiting correlated experimental designs, such as time series or dosage response experiments. Time series RNA-seq experiments, in particular, are becoming increasingly popular, yet there are no methods that explicitly leverage the experimental design to improve isoform quantification. Here we present DICEseq, the first isoform quantification method tailored to correlated RNA-seq experiments. DICEseq explicitly models the correlations between different RNA-seq experiments to aid the quantification of isoforms across experiments. Numerical experiments on simulated data sets show that DICEseq yields more accurate results than state-of-the-art methods, an advantage that can become considerable at low coverage levels. On real data sets, our results show that DICEseq provides substantially more reproducible and robust quantifications, increasing the correlation of estimates from replicate data sets by up to 10% on genes with low or moderate expression levels (bottom third of all genes). Furthermore, DICEseq makes it possible to quantify the trade-off between temporal sampling of RNA and depth of sequencing, frequently an important choice when planning experiments. Our results have strong implications for the design of RNA-seq experiments, and offer a novel tool for improved analysis of such data sets. Python code is freely available at http://diceseq.sf.net.

    Screening Spin Lattice Interaction Using Deep Learning Approach

    Atomic simulations hold significant value in clarifying crucial matters such as phase transitions and energy transport in materials science. Their success stems from the presence of potential energy functions capable of accurately depicting the relationship between system energy and lattice changes. In magnetic materials, two atomic scale degrees of freedom come into play: the lattice and the magnetic moment. Nonetheless, precisely portraying the interaction energy and its impact on lattice and spin-driving forces, such as atomic force and magnetic torque, remains a formidable task in the computational domain. Consequently, there is no atomic-scale approach capable of elucidating the evolution of lattice and spin at the same time in magnetic materials. Addressing this knowledge deficit, we present DeepSPIN, a versatile approach that generates high-precision predictive models of energy, atomic forces, and magnetic torque in magnetic systems. This is achieved by integrating first-principles calculations of magnetic excited states with advanced deep learning techniques via active learning. We thoroughly explore the methodology, accuracy, and scalability of our proposed model in this paper. Our technique adeptly connects first-principles computations and atomic-scale simulations of magnetic materials. This synergy presents opportunities to utilize these calculations in devising and tackling theoretical and practical obstacles concerning magnetic materials. Comment: 8 pages, 4 figures

    Transcriptome-wide RNA processing kinetics revealed using extremely short 4tU labeling

    Background: RNA levels detected at steady state are the consequence of multiple dynamic processes within the cell. In addition to synthesis and decay, transcripts undergo processing. Metabolic tagging with a nucleotide analog is one way of determining the relative contributions of synthesis, decay and conversion processes globally. Results: By improving 4-thiouracil labeling of RNA in Saccharomyces cerevisiae, we were able to isolate RNA produced during as little as 1 minute, allowing the detection of nascent pervasive transcription. Nascent RNA labeled for 1.5, 2.5 or 5 minutes was isolated and analyzed by reverse transcriptase-quantitative polymerase chain reaction and RNA sequencing. High kinetic resolution enabled detection and analysis of short-lived non-coding RNAs as well as intron-containing pre-mRNAs in wild-type yeast. From these data we measured the relative stability of pre-mRNA species with different high turnover rates and investigated potential correlations with sequence features. Conclusions: Our analysis of non-coding RNAs reveals a highly significant association between non-coding RNA stability, transcript length and predicted secondary structure. Our quantitative analysis of the kinetics of pre-mRNA splicing in yeast reveals that ribosomal protein transcripts are more efficiently spliced if they contain intron secondary structures that are predicted to be less stable. These data, in combination with previous results, indicate that there is an optimal range of stability of intron secondary structures that allows for rapid splicing.

    CRISPR/Cas9‐mediated somatic correction of a novel coagulator factor IX gene mutation ameliorates hemophilia in mouse

    The X‐linked genetic bleeding disorder caused by deficiency of coagulator factor IX, hemophilia B, is a disease ideally suited for gene therapy with genome editing technology. Here, we identify a family with hemophilia B carrying a novel mutation, Y371D, in the human F9 gene. We used the CRISPR/Cas9 system to generate distinct genetically modified mouse models and confirmed that the novel Y371D mutation resulted in a more severe hemophilia B phenotype than the previously identified Y371S mutation. To develop therapeutic strategies targeting this mutation, we subsequently compared naked DNA constructs versus adenoviral vectors to deliver Cas9 components targeting the F9 Y371D mutation in adult mice. After treatment, hemophilia B mice receiving naked DNA constructs exhibited correction of over 0.56% of F9 alleles in hepatocytes, which was sufficient to restore hemostasis. In contrast, the adenoviral delivery system resulted in a higher corrective efficiency but no therapeutic effects due to severe hepatic toxicity. Our studies suggest that CRISPR/Cas‐mediated in situ genome editing could be a feasible therapeutic strategy for human hereditary diseases, although an efficient and clinically relevant delivery system is required for further clinical studies.

    BRIE: transcriptome-wide splicing quantification in single cells

    Single-cell RNA-seq (scRNA-seq) provides a comprehensive measurement of stochasticity in transcription, but the limitations of the technology have prevented its application to dissect variability in RNA processing events such as splicing. Here, we present BRIE (Bayesian regression for isoform estimation), a Bayesian hierarchical model that resolves these problems by learning an informative prior distribution from sequence features. We show that BRIE yields reproducible estimates of exon inclusion ratios in single cells and provides an effective tool for differential isoform quantification between scRNA-seq data sets. BRIE, therefore, expands the scope of scRNA-seq experiments to probe the stochasticity of RNA processing.

    Hyperoxemia and excess oxygen use in early acute respiratory distress syndrome: Insights from the LUNG SAFE study

    Publisher Copyright: © 2020 The Author(s). Background: Concerns exist regarding the prevalence and impact of unnecessary oxygen use in patients with acute respiratory distress syndrome (ARDS). We examined this issue in patients with ARDS enrolled in the Large observational study to UNderstand the Global impact of Severe Acute respiratory FailurE (LUNG SAFE) study. Methods: In this secondary analysis of the LUNG SAFE study, we wished to determine the prevalence and the outcomes associated with hyperoxemia on day 1, sustained hyperoxemia, and excessive oxygen use in patients with early ARDS. Patients who fulfilled criteria of ARDS on day 1 and day 2 of acute hypoxemic respiratory failure were categorized based on the presence of hyperoxemia (PaO2 > 100 mmHg) on day 1, sustained (i.e., present on day 1 and day 2) hyperoxemia, or excessive oxygen use (FIO2 ≥ 0.60 during hyperoxemia). Results: Of 2005 patients who met the inclusion criteria, 131 (6.5%) were hypoxemic (PaO2 < 55 mmHg), 607 (30%) had hyperoxemia on day 1, and 250 (12%) had sustained hyperoxemia. Excess FIO2 use occurred in 400 (66%) out of 607 patients with hyperoxemia. Excess FIO2 use decreased from day 1 to day 2 of ARDS, with most hyperoxemic patients on day 2 receiving relatively low FIO2. Multivariate analyses found no independent relationship between day 1 hyperoxemia, sustained hyperoxemia, or excess FIO2 use and adverse clinical outcomes. Mortality was 42% in patients with excess FIO2 use, compared to 39% in a propensity-matched sample of normoxemic (PaO2 55-100 mmHg) patients (P = 0.47). Conclusions: Hyperoxemia and excess oxygen use are both prevalent in early ARDS but are most often non-sustained. No relationship was found between hyperoxemia or excessive oxygen use and patient outcome in this cohort. Trial registration: LUNG-SAFE is registered with ClinicalTrials.gov, NCT02010073.
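
    The category definitions quoted in the LUNG SAFE abstract can be written out as a minimal sketch. The function names below are hypothetical and are not taken from the study; only the thresholds (PaO2 < 55, 55-100 and > 100 mmHg; FIO2 ≥ 0.60 during hyperoxemia) and the patient counts come from the abstract. The last helper simply recomputes the reported percentages from those counts.

```python
# Illustrative sketch only: names are hypothetical, thresholds are those
# stated in the abstract (PaO2 in mmHg, FIO2 as a fraction).

def oxygenation_category(pao2_mmhg: float) -> str:
    """Classify a PaO2 value using the abstract's cut-offs."""
    if pao2_mmhg < 55:
        return "hypoxemic"
    if pao2_mmhg <= 100:
        return "normoxemic"
    return "hyperoxemic"


def is_sustained_hyperoxemia(pao2_day1: float, pao2_day2: float) -> bool:
    """Hyperoxemia present on both day 1 and day 2."""
    return pao2_day1 > 100 and pao2_day2 > 100


def is_excess_oxygen_use(pao2_mmhg: float, fio2: float) -> bool:
    """FIO2 >= 0.60 while the patient is hyperoxemic."""
    return pao2_mmhg > 100 and fio2 >= 0.60


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


if __name__ == "__main__":
    total = 2005  # patients meeting the inclusion criteria
    print(f"hypoxemic: {percent(131, total):.1f}%")
    print(f"hyperoxemia on day 1: {percent(607, total):.1f}%")
    print(f"sustained hyperoxemia: {percent(250, total):.1f}%")
    print(f"excess FIO2 among hyperoxemic: {percent(400, 607):.1f}%")
```

    The recomputed shares round to the figures given in the abstract (6.5%, 30%, 12% and 66%).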